Parser Adaptation for Social Media by Integrating Normalization

Authors

  • Rob van der Goot
  • Gertjan van Noord
Abstract

This work explores normalization for parser adaptation. Traditionally, normalization is used as a separate preprocessing step. We show that integrating the normalization model into the parsing algorithm is beneficial. To this end, we use a normalization model combined with the parsing-as-intersection algorithm. This way, multiple normalization candidates can be leveraged, which improves parsing performance on social media. We test this hypothesis by modifying the Berkeley parser; out of the box, it reaches an F1 score of 66.52. Our integrated approach performs significantly better, with an F1 score of 67.36, while using the best normalization sequence results in an F1 score of only 66.94.
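To make the idea of leveraging multiple normalization candidates concrete, the following is a minimal sketch, not the authors' implementation and not the Berkeley parser's API. It assumes a toy grammar in Chomsky normal form (the hypothetical `LEXICON` and `BINARY` tables), assumes that each input token is replaced by exactly one candidate word, and uses made-up normalization probabilities. A CKY-style chart is seeded with every candidate at each position, so the parser, rather than a preprocessing step, decides which candidate to use.

```python
from collections import defaultdict

# Hypothetical toy grammar; all rules and probabilities are illustrative only.
LEXICON = {  # word -> [(preterminal, P(word | preterminal))]
    "i": [("NP", 0.5)],
    "him": [("NP", 0.5)],
    "know": [("V", 1.0)],
    "no": [("DT", 1.0)],
}
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}


def parse(candidates):
    """Viterbi CKY over per-token normalization candidates.

    candidates[i] is a list of (word, normalization probability) pairs for
    token i. Returns (score, tree) for the best S spanning the input, or
    None if no candidate sequence yields a parse.
    """
    n = len(candidates)
    chart = defaultdict(dict)  # (start, end) -> {label: (score, tree)}

    # Seed the chart with every normalization candidate, weighting the
    # lexical probability by the normalization probability.
    for i, options in enumerate(candidates):
        for word, p_norm in options:
            for tag, p_lex in LEXICON.get(word, []):
                score = p_norm * p_lex
                if score > chart[i, i + 1].get(tag, (0.0, None))[0]:
                    chart[i, i + 1][tag] = (score, (tag, word))

    # Standard CKY combination of adjacent spans, keeping the best
    # derivation per label.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, (l_score, l_tree) in chart[i, k].items():
                    for right, (r_score, r_tree) in chart[k, j].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            score = p_rule * l_score * r_score
                            if score > chart[i, j].get(parent, (0.0, None))[0]:
                                chart[i, j][parent] = (score, (parent, l_tree, r_tree))

    return chart[0, n].get("S")


# Noisy input "i no him": the normalizer prefers keeping "no" (0.6)
# over rewriting it to "know" (0.4).
integrated = parse([[("i", 1.0)], [("no", 0.6), ("know", 0.4)], [("him", 1.0)]])
one_best = parse([[("i", 1.0)], [("no", 0.6)], [("him", 1.0)]])
print(integrated)
print(one_best)
```

In this toy case the highest-scoring normalization on its own ("no") leaves the sentence unparseable under the grammar, while the integrated search recovers a parse through the lower-ranked candidate "know". The sketch only handles one-to-one word replacements; a general word lattice with insertions or splits would need spans indexed by lattice states instead of token positions.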

Similar resources

Does Size Matter? Text and Grammar Revision for Parsing Social Media Data

We explore improving parsing social media and other web data by altering the input data, namely by normalizing web text, and by revising output parses. We find that text normalization improves performance, though spell checking has more of a mixed impact. We also find that a very simple tree reviser based on grammar comparisons performs slightly but significantly better than the baseline and we...

LexToPlus: A Thai Lexeme Tokenization and Normalization Tool

The increasing popularity of social media has a large impact on the evolution of language usage. The evolution includes the transformation of some existing terms to enhance the expression of the writer’s emotion and feeling. Text processing tasks on social media texts have become much more challenging. In this paper, we propose LexToPlus, a Thai lexeme tokenizer with term normalization process....

What to do about bad language on the internet

The rise of social media has brought computational linguistics in ever-closer contact with bad language: text that defies our expectations about vocabulary, spelling, and syntax. This paper surveys the landscape of bad language, and offers a critical review of the NLP community’s response, which has largely followed two paths: normalization and domain adaptation. Each approach is evaluated in t...

An Unsupervised Text Normalization Architecture for Turkish Language

A variety of applications on the problem of short-text messages require text normalization process that transforms ill-formed words into standard ones. Recently, many successful approaches have been applied to text normalization especially for social media text. Since each natural language has its own difficulties and barriers, we need to design an architecture to normalize short text messages ...

Adapting a general parser to a sublanguage

In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is the use of terminology (identification and analysis of terms) in order to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined among which text normalization, lexicon and morpho-guessing m...


Publication date: 2017